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ON IMPORTANCE SAMPLING WITH MIXTURES FOR 
RANDOM WALKS WITH HEAVY TAILS 

"Os 

Q , HENRIK HULT AND JENS SVENSSON 

O 

■ Abstract. Importance sampling algorithms for heavy-tailed random walks 

C^h' are considered. Using a specification with algorithms based on mixtures of 

O , the original distribution with some other distribution, sufficient conditions 

i for obtaining bounded relative error are presented. It is proved that mixture 

algorithms of this kind can achieve asymptotically optimal relative error. Some 
examples of mixture algorithms are presented, including mixture algorithms 
using a scaling of the original distribution, and the bounds of the relative errors 
are calculated. The algorithms are evaluated numerically in a simple setting. 



Oh 

^ ' 1. Introduction 

Tail probabilities appear naturally in many applications of probability theory, 
and often analytical evaluation is not possible. For many applications, Monte Carlo 
simulation can be an effective alternative. For rare events, however, standard Monte 
Carlo simulation is very computationally inefficient, and some form of variance 
reduction method is necessary. One such alternative that has been extensively 
applied to both light- and heavy-tailed distributions is importance sampling. In this 
, paper we focus on importance sampling algorithms for computing the probability 

Pb = P{S n >b), 

of a high threshold b, for a random walk S n = Xi + • ■ ■ + X n . The random variables 



> 



On 

' X%,..., X n are independent and identically distributed with distribution function 

F and density /. It is assumed that the right tail of / is regularly varying at oo; 
more precisely there exists an a > such that, for each x > 0, 



>< 



5-H 



,. f(ux) 
am 



u — >oo 



/(«) 

Then it is well known that / has the representation f(x) = x~ a ^ 1 L(x), x > 0, 
where L is slowly varying. The joint distribution of (X\, . . . , X n ) is denoted n n . 

Consider first a computation of pb using standard Monte Carlo. Then N inde- 
pendent samples {X\, . . . , X*), . . . , (Xf , . . . , X^) are generated from [i n and pb is 
estimated using the sample frequency 

! N 

Pb 



N ■ 

i=l 



where S l n = X\ + ■ ■ ■ + X l n . For large b, the event {S n > b} is rare and few of the 
indicator variables > b} will be 1. This leads to rather inefficient estimation. 
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To see this, consider for instance the standard deviation of p\ lc ■ An elementary 
calculation shows 



Stdev(p 6 ) = — L=v^(i - 



N ■ ' Pb) 



When pt, is small this is roughly y/Pb/N. Hence, it would require N « sam- 
ples to have the standard deviation of size comparable to the quantity pb we are 
estimating. When pb is small this can be very large. 

Importance sampling provides a way to possibly reduce the computational cost 
without sacrificing precision, or equivalently to improve precision without increas- 
ing the computational cost. The basic idea of importance sampling to generate 
samples (Xl, . . . , X^), ■ ■ ■ , {X^ , . . . , X„) independently from a sampling measure 
v\ instead of \i n . It is assumed that [i n is absolutely continuous with respect to v^, 
written fx n <C v h n so that the Radon-Nikodym derivative -Jrjg- exists. An unbiased 
estimate of pb is constructed as 

1 N A 

p b = ^Y.^ X u...,X n )I{S n >b}. 

i=i 71 

The goal is to choose v b n so more samples are drawn from regions that are "impor- 
tant" to the event {S n > b}. Then the event becomes less rare under which 
reduces variance. However, v\ must be chosen carefully so that the Radon-Nikodym 
weights ^f-(Xi, . . . , X n ) do not cause variance to increase. A relevant quantity for 
deciding if a sampling measure v\ is appropriate or not is the relative error 




REiPb) = ~Ep~ b = 

By Jensen's inequality we always have Ep^ 

To quantify the efficiency of the sampling measure it is convenient to study 
the asymptotics of the relative error as b — ► oo. This amounts to studying the 
asymptotics of normalized second moment lim^oo Ep^/p^. We say that a sampling 
distribution i/£ has logarithmically efficient relative error if, for some e > 0, 

, Eft 
hmsup 2 _ < oo, 

6— «x> p b 

it has bounded relative error if 

hmsup — ^- < oo, 

b^oo Pb 

and asymptotically optimal relative error if 

r E % i 
hm sup — — = 1 . 

b^oc Pb 

A number of different algorithms have been proposed to simu l ate ta il probabil- 
ities of heavy-tailed random walks. lAsmussen and Binswanger ( 1997f) study the 



class of subexponential distributions, i.e. distributions for which 

hm > I = 1, 

fc^oo nP(X 1 > b) 
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and use that as b — > oo , all the variance of the sum comes from the largest summand. 
By removing the largest term X/ n \ in each sample and calculating the probability 
using the remaining n — 1 terms, they obtain a logarithmically efficient conditional 
Monte Carlo estimator in the sub-class of distributions with regularly varying tails. 
Here Xm < -X~(2) < ... < Xt n \ is the ordered sample. Specifically, with the sample 
X\ , . . . , X n , the estimator is 



P(S n >b\X {1) ,...,X (n _ 1} ) 



F(X (n _D V(6 -S( n -p)) 



where St n —\\ = Xm + . . . X( n _u is the sum of the n — 1 largest of the n terms in 
th e sample and a V b = maxfa b) . 

Asmussen and Kroese ( 20061 ) propose a similar idea, using the conditioning 



nP(S n >b,M n = X n \X l ,...,X n - l ) = F(M n _i V (b - S (Ti _i))), 

where M n = max(A"i, . . . , X n ) to obtain an estimator with bounded relative error 
fo r distributions with regula r ly var ying tails. 

Junej a and Shahabuddinl ( 2002h introduce an importance sampling algorithm 



with similar structure to exponential twisting in the light-tailed case. Their so- 
called hazard rate twisting of the original distribution is given by 



dF e (x) 



e 9K ^dF{x) 
J °° e B ^dF(x) 



(1.1) 



where < 6 < 1 and A(x) = — logF(x) is the hazard rate. For distributions 
with regularly varying tails, this is equivalent to changing the tail index of the 
distribution. 

In the case of importance sampling, Ba ssamboo et al. ( 2007 ) show that to obtain 
efficient sampling distributions for heavy-tailed random walks, one must consider 
state-dependent changes of measure. Simply changing the parameters in the original 
distribution cannot lead to an estimator with bounded relative error. 

The first a l gorith m of this type, for heavy-tailed random walks, was proposed by 
Dupuis et al.1 (|2007l ). There the large values are sampled from the conditional dis- 
tribution where one has to condition on exceeding a level just below the remaining 
distance to b. The authors prov e that their propo s ed alg orithm has close to asymp- 
totically optimal relative error. iBlanchet and Lil (|2008l ) present a state-dependent 
algorithm that uses Markov chain description of the random walk under the sam- 
pling meas ure to obtain bounded re lative error for the class of subexponential dis- 
tributions. Blanchet and Liu ( 2008f ) construct a mixture algorithm with bounded 
relative error for the large deviation probability P(S n > b) where b > bon 1 / 2+e . 

In t his paper we take & more general look at mixture algorithms of the same 
type as iDupuis et al.1 (|2007l ) . The underlying idea is to construct a dynamic change 
of measure such that the trajectories of Xi, . . . , X n leading to S n > b is similar to 
the most likely trajectories conditional on S n > b. In the heavy-tailed case, the 
most likely trajectories are such that one of the Xi's is large and the others are 
"average". Mixtures arise quite naturally as sampling distributions for producing 
such trajectories; with some probability pi sample from the original density / and 
with probability qn = 1 — pi sample from a density where it is likely to get a large 
value. We provide sufficient conditions for bounded relative error and provide a 
couple of new examples that are very easy to implement. We also show that, with 
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some additional work, one can construct mixture algorithms with asymoptotically 
optimal relative error. 

The paper is organized as follows. In Section 2 we present a general importance 
sampling algorithm based on mixtures and provide several examples. In Section 3 
we provide sufficient condition for the mixture algorithm to have bounded relative 
error. In Section 4 we provide detailed analysis of specific mixture algorithms. The 
concluding Section 5 provides a proof that it is possible to obtain asymptotically 
optimal relative error. 



2. Dynamic mixture algorithms 

In this section we describe a general importance sampling algorithm based on 
mixtures, called the dynamic mixture algorithm, and provide several examples. 

The dynamic mixture algorithm for computing pi, = P(S n > b) proceeds as 
follows. Each replication of (X±, . . . ,X n ) is generated dynamically and the distri- 
bution for sampling Xi depend on the current state S^-i = X\ + ■ • • + Xj_i of the 
random walk. At the ith step it may be that Si-i already exceeds the threshold b. 
Then Xi is sampled from the original density /. Otherwise, if Si-i < b, a biased 
coin is tossed with probability pi for "heads" and = 1 — pi for "tails" . If it comes 
up "heads" Xi is generated from the usual density /, but if it comes up "tails", Xi 
is generated from another density gi(x | S^-i). The density gi(x \ Si-i) depends 
on the current generation i of the algorithm and on the current position Si—i. The 
idea is to choose g$(x | Si-i) s.t. sampling from gi(x \ Si-i) is likely to result in 
a large variable. However, gi(x | must be chosen with some care to control 

the Radon- Nikodym weights ^^-(Xi, . . . ,X n ) and thereby the relative error. In 
the last generation, if S n -i < b, X n is sampled from a density g n {x \ S n -i) and 
if S n -i > b it is sampled from the original /. In contrast to the previous steps g n 
is not necessarily of mixture type. The reason is that it may be advantageous to 
make sure X n > b — 5„_i in the last step to get S n > b. 

A precise description of the dynamic mixture algorithm is presented next. 

Algorithm 1. Consider step i — 1, . . . , n, where Si-i = Sj_i. Then Xi is sampled 
as follows. 

• If s^i > 6, Xi is sampled from the original density /, 

• if Si_i < b, Xi is sampled from 

Pi f(-) + qi9i{- I Si-i), for 1 < i < n - 1, 
9n{- I s„_i), for i = n. 

Here pi + qi = 1 and pi 6 (0, 1). 

Explicit examples of the dynamic mixture algorithm are obtained by specifying 
gi and pi. 



Example 2 .1 (Conditional mix ture, c.f. Dupuis et al. ( 20071 )). The algorithm 



proposed by iDupuis et al. ( 2007 ) takes gi to be a conditional distribution. For 



i = 1, . . . , n — 1 the large values are sampled conditional on being at least a times 
the remaining distance to 6, where a G (0, 1). It is important that a < 1. In the 
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last step samples are generated conditional on exceeding b. More precisely, 

, . . f(x)I{x > a(b — s)\ 

gAx a) = — — — , Ki<n-1, 

yK 1 ; F(a(b-s)) ~ ~ 

f(x)I{x>b-s} 

q n (x s) — = . 

y y 1 ' F{b-s) 

In their paper the authors assume that / = on (— oo, 0). That is, all the Xi's are 
non-negative. This is not an important restriction and we do not impose it here. 

A practical limitation of the conditional mixture algorithm is that some dis- 
tributions do not allow direct sampling from the conditional distribution. If the 
distribution function F and its inverse F^ are available, the inversion method sug- 
gest sampling X conditional on X > c by taking U to be un iform on (0,1) and 



set X = F*~(l — UF(x)), see e.g. lAsmussen and Glvnnl (|2007l ). In other cases it 



might be necessary to use an acceptance-rejection method, but this may be time 
consuming. 

A simple alternative to the conditional mixture is to sample the large variables 
from a generalized Pareto distribution (GPD) instead. The intuition is that the 
GPD approximates the conditional distribution well. 

Example 2.2 (Generalized Pareto mixture). The GPD mixture algorithm takes 
gi to be a generalized Pareto distribution. As in the previous algorithm, for i — 
1, . . . , n — 1, the large values are sampled conditional on being at least a times the 
remaining distance to b, where a E (0, 1). The last step is slightly different. If the 
remaining distance is large, the last step is taken from a GPD, otherwise it is taken 
from the original density. This is because, if S n -i < b, but close to b, the GPD is 
not necessarily a good approximation of the conditional distribution. To be precise, 

g l (x | s) = a[a(b - s)] a x^ a ^ 1 I{x > a(b - s)}, 1 < i < n - 1. 
g n (x | s) = a(b - s) a X - a - x I{x > b - s}I{s < b - 6(1 - a)"" 1 } 
+ f(x)I{s>b-b{l-a) n - 1 }. 

A different way to sample the large variables is to sample from the original 
density and then scale the outcome by simply multiplying with a large number Xb. 
We call this a scaling mixture algorithm. 

Example 2.3 (Scaling mixtures). The scaling mixture algorithm has, with A > 0, 

9l (x | s) = (\b)-\f(x/Xb)I{x > 0} + f(x)I{x < 0}, i = 1, . . . , n - 1, 

g n (x | s) = (Xb^fix/X^Iix > 0, s < b - 6(1 - a)"" 1 } 

+ f(x)I{x < or s > 6 - 6(1 - a)"" 1 }. 

To simplify the analysis we will, in the context of scaling mixtures, always assume 
that the orginal density / is strictly positive on (0, oo). If this is not satisfied the 
situation is more involved because there may be large x > such that f(x) > 
but f{x/Xb) = 0. Then such large x-value cannot be obtained by sampling a small 
number from / and scale by A6. This may cause the Radon-Nikodym weights to 
be relatively large, which increase the variance. 

There are several variations of the scaling algorithm. For instance, one may scale 
with something proportional to the remaining distance to 6, instead of something 
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proportional to b as described above. Some variations of the scaling algorithm will 
be treated in more detail in Section 



3. Asymptotic analysis of the normalized second moment 

The efficiency criteria presented in the introduction are all based on the asymp- 
totic properties of the normalized secon d moment Efr j /pj. We are following the 
weak convergence approach initiated by iDupuis et al.l ( 2007 ) to study its asymp- 



totics. By the subexponential property, pi ~ n 2 F(b) 2 1 where a& ~ c b denotes 
limb— >oo flb/cf, = 1, the normalized second moment can be written as 

lr ~ I M {vW{dy) = h I W) d ^ n [hy)mh{dvl (3 - 1} 

s n >b s„>l 

where the measure raj = /!„(&(• n {s n > ^}))/F(b)- To calculate the limit of 
this integral we will use the weak convergence of the measure mj to a measure m 
and uniform convergence of an upper bound Rl(y) > ^^^^t{by) ='■ Rb{y) to a 
bounded continuous function R(y). Then we (?) establish the convergence 

limsup / Rbdrrib < lim / Rldrrib = / Rdm. 



b-foo J b^oo J J 

{s n >l} {s„>l} {s n >l} 

To do this it is convenient if the normalized Radon-Nikodym derivative Rb{y) is 
bounded. This criteria is certainly stronger than necessary but appears to be de- 
sirable. It implies the the normalized q-moment is asymptotically bounded for any 
q G (1, oo). Indeed, if R£ is bounded and R* — > R uniformly, then for any q G (1, oo) 

limsup^- = limsup^- f (=^^-(by)Y m b {dy) < \ [ R"' 1 dm < ac. 

S„>1 

Next we provide sufficient conditions for Rb to be bounded. 

Lemma 3.1. Consider Algorithm^ with pi > for 1 < i < n — 1. Suppose there 
exists a G (0, 1) such that 

liminf inf I hs ^ p( b \ > q l<{< n (3.2) 

b^co ,<!_(!_ )*-i f{by) 

y > a(l - s) 

limsup sup — - — — — < oo. (d-d) 

b^oo s <i g n {by\bs) 

y > 1 - s 

Then the scaled Radon-Nikodym derivative R b (y) = "p^~3^t(by) * s bounded on 

{yi + --- + y n > l}. 

Proof. Let s n = y\ + • • • + y n . On {s„ > 1} it must hold that > a(l — Si_i) for 
some i = 1, . . . , n. Otherwise Sj < 1 — (I — a) 1 < 1 for each i. 

Take y G {y € K™ : s„ > 1} and let z = min{j : y.j > a(l — Sj_i)}. Note that for 
this i 

y t > o(l - Si_i) > o(l - a) i_1 > o(l - a) n =: a„ > 0. 
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For any y h j g {i,n}, 



f(bVi) < 1 



P.if( f >!l,) + Qj9jQ>Vj I bsj-i) pj ' 
It follows that, for 1 < i < n — 1, 

1 dp (b v < 1 f(by l )I{y l > a(l - gj-i)} x tt J_ 

f{bVn)I{y n > 1 - gn-l} Tf ^iX^TS vUi r- l', 
77 TT s I{S n -l < 1} + i{Sn-l > 1} ■ 3.4) 

g n (by n I &s„_i) 
The first term can be written as 

1 f{by i )I{y l > o(l - Si_i)} _ /{y, > o(l - s;_i)} 



F(6) Pi/(&w) + qi9i(byi | 6s,_i) + ?i 8i W;j- 1 » F(t) ' 

By (|3.2[) this term in (|3 .4|) is bounded. The second term is bounded because Pj > 
by assumption. The last term is bounded by (|3.3[) . 
Similarly for i = n, 

1 dfi , , f{by n )I{y n > a(l - s„_i)} ™ 



\ < f{by n )I{yn > a(l ~ -pr 1_ 



F(6) t&>£ " | bs n ^)F(b) f = \ Pj 

which is bounded by (|3.2[) . □ 

Next we present the main result. It provides sufficient conditions for the mixture 
algorithms to have bounded relative error. This is obtained by showing that the 
normalized second moment remains bounded. 

Theorem 3.2. Suppose (|3.2[) and (|3.3p hold for a £ (0,1). Suppose in addition 
that there exist continuous functions hi : R™ — > [0, oo) such that 

77 MW I (3-5) 

3i(02/i I bsi-x)k (b) 

uniformly on {y € R™ : Sj_i < 1 — (1 — a) 1 " 1 , yi > a(l — s)}. Then, 
wii/i i/ie convention that q n = 1. 

Proof. First rewrite the normalized second moment as in (13. 1| : 

\ ^^(by)m b (dy) = ± J R b (y)m b (dy). 

S„>1 

By regular variation of / and independence of X\, . . . ,X n the joint distribution 
fj, n is multivariate regularly varying. In particular the weak convergence m b A m 
holds, where m has the representation 

n p 

m{A) =J2 I{y E R" : Vl > 1, Vj =0,j? ijay^dy,. (3.6) 

~1 J A 
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This is well known , see e.g. Resnickl (1982), Section 5.5. A proof is also given by 



Dupuis et al.l (|2007l ). We see that the measure m puts all its mass on the coordinate 



axes. That is, on trajectories where one jump is large and the rest are zero. 
The next step is to decompose the integral as 

Rbdmi, = / Rbdmi, + / Rbdmb, (3-7) 

J A J A c 

where A — U" =1 Ai is a finite union and the A^s have disjoint closures. We will 
find Ai such that the second integral converges to and determine an upper bound 
R* b > Rb on Ai. 

Define the sets Aj to be 

At = {y e M™ : % < o(l - Sj-x) for 1 < j < i - 1,^ > 1 - Si-i, 

and Sfc > 1, A; = i + 1, . . . , n}. 

Note that the A,'s have disjoint closure and m(dAi) = 0. In particular rrib(Ai) — > 
m(Ai) for each i = 1, . . . , n. Moreover, m(C]™ =1 Af) = 0. Indeed, 

n 

m(C^ =1 AD = m({s n > 1} \ U(Ai) = m{s n > 1} - E m(Ai) 

n p 

= J2 / I{y e R" : Vi >l, yj = 0,jjt i}ay- a ~ 1 dy l 

n . 

- E / e R " : V* > X > % = °' ^ ^r^dVi = 0. 
~~7 JA- 



By Lemma IBTTl i?b is bounded and since m,b(A c ) — ► m(A c ) = 0, the second integral 
in (|3.7[) converges to 0. For the first integral we construct a function i?£ that 
dominates Rb on A and a continuous function R such that Rl —> R uniformly on 
A. Then it follows from weak convergence that 



limsup / Rbdrrib < lim / R^dnib = / Rdm < oo. 



For y E Ai 



p / x s 1 f(byi)I{Vi > 1 - Sj-i} -pr 1 , . 

Rb{y) < 777 ; , 77 T7 C I I ~ = : #bU/)- 

F(b) Pifybyi) + qigi(byi \ &Si-i) ^ 

To see this, construct a bound as in (|3 .4[) and notice that on A,, > 1 for each 
k > i. Then the contribution to the Radon-Nikodym weights from yk, k > i is 
equal to 1. By assumption (13. 5|) 

/(%) I s i-i)' 

uniformly on Ai. For y G A 4 define R(y) = hi( yi | Si-i)]]^ £J^- Tnen R t ^ R 
uniformly on A. With the representation (|3.6p of the limiting measure m, the upper 
bound for the normalized second moment can now be calculated as 

If 1 n l ~ 1 1 1 f°° 

/ M - = ^2 E II -- / i o)otfr a_1 di/i- 

" J A 11 „■_•, „■_-, Pi Hi J \ 



□ 



IMPORTANCE SAMPLING FOR RANDOM WALKS WITH HEAVY TAILS 



9 



4. Examples 

In this section we provide a detailed analysis of the algorithms presented in 
Section O In particular we verify the conditions of Lemma 13.11 and Theorem 
for these algorithms. 



4.1. The conditional mixture algorithm. Recall from Example 12.11 that the 
conditional mixture algorithm has, with a € (0, 1), 

, . , fix) I{x > aib — s)} „ ^ . „ 

gdx a) = — — , , — — , K«<n-1, 

yn 1 ; F(a(b-s)) > - - 

f(x) I{x > b - s} 
q n (x s) — = . 

y y 1 ; F(b-s) 

Then, for i = 1,. .. ,n — 1, the uniform convergence F(bx)/ F(b) — > x~ a , for a; > 
xo > 0, implies 

gi( f & f'f 8) ^) = =j^-:I{x > a(l - «)} - a"(l - > a(l - «)}, 

/(te) F(&a(l - s)) 

uniformly for s < 1 — (1 — a) , a; > a(l — s). Similarly, 

uniformly on s < 1 — (1 — a)™ -1 , x > 1 — s, and 

-^=T(6(1 -*))<!, 
g n (oa; | bs) 

on s < 1. It follows that both (|3.2p and (|3.3p are satisified and hence the normalized 
Radon-Nikodym derivative is bounded. 

By the above calculation (]3 . 5|) holds with hi(y \ s) — a~ a (l — s)~ a , 1 < i < n— 1 
and h n (y \ s) = (1 — s)~ Q . It follows from Theorem 13.21 that 

Hm%<i n 1 +n 1 )- 

The right hand side is minimized at 

(n-i- l)a- Q / 2 + l , x 

(n — i)a a l 1 + 1 

with minimum n~~ 2 [(n — \)aT a / 2 + l] 2 , a nd it is possib l e to sh ow that the limit is 
equal to the right hand side of (|4.ip . see iDupuis et al.1 (|2007l ). Lemma 3.2.1. For 
each n this can be made arbitrarily close to 1 by choosing a close to 1. 



4.2. Generalized Pareto mixture. Recall from Example 12.21 that the GPD mix- 
ture algorithm has, with a € (0, 1), 

g,(x | s) = a[a(b - s^x'^Iix > a(b - s)}, 1 < i < n - 1. 
g n (x | s) = a(b - s) a x- a - 1 J{x > b - s}I{s < 6 — 6(1 — a)"" 1 } 
+ f(x)I{s >b-b{l-a) n - 1 }. 
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First we check Q3.2p and (13. 3|) . Karamata's theorem implies aF(b) ~ bf(b). Then, 
for any s < 1, 



9l {bx | bs) - = afc)-"- 1 ^!-)))"^) 
/(M 1 J _ /(to) 

aF(6a;) a a (l - s) a F(b) 



a a (l-s) a . (4.3) 



bxf(bx) x a F(bx) 
uniformly for x > a(l — s). In particular (|3 . 2[) is satisfied. Since 

m) - ^/{y>l- S }/{ S <l-(l-an 



5n(6y I bs) a(l - s) Q y 

+ 7{s > 1 - (1 - a) 11 - 1 } 

is bounded on s < 1, y > 1 — s, (|3 . 3[) also holds. By Lemma [3.11 the normalized 
Radon-Nikodym derivative is bounded. By the arguments above (|3.5I) holds with 
hi(y | s) = a~ Q (l - s)- Q , 1 < i < n - 1 and h n (y \ s) = (1 - s)~ a . It follows by 
Theorem [331 that 

This is identical to (|4.ip . so can be chosen according to (|4.2|) to minimize the 
relative error. 

4.3. Scaling mixtures. In the scaling mixture algorithm presented in Example 
12.31 the large variables are generated by sampling from the original density and 
multiplying with a large number. In this section we study some variations of this 
algorithm. Recall that, in the context of scaling mixtures, always assume that the 
orginal density / is strictly positive on (0, oo). 

The first scaling mixture algorithm, called scaling mixture I, is constructed as 
follows. Write f(x) = x^ a - 1 L{x) with L slowly varying. Suppose \xii x>XQ L(x) =: 

> for some xo > 0. The scaling mixture algorithm, with A > 0, has 

g,{x | s) = {Xb^fix/Xb^ix > 0} + f{x)I{x < 0}, i = 1, . . . , n - 1, 
g n (x | s) = (Xb)-\f(x/Xb)I{x > 0, s < b - 6(1 - a)"- 1 } 
+ f(x)I{x < or s > b - 6(1 - a)"" 1 }. 

To generate a sample X from gi proceed as follows. Generate a candidate X' from 
/. If X' < put X = X' and if X' > 0, put X =_XbX' . 

Take a £ (0, 1), using Karamata's theorem, aF(b) ~ bf(b), we have, for 1 < i < 
n, and s < 1 — (1 — a)*" 1 , 

_./(f )rfW w I{I>0}+J , (l)f)I , 0) 



aA bxf(bx) F(6a;) 

r_ 

X 



x«+V(f) 



a A 
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uniformly for x > 1 - s. Since x a+1 f(x/X) > X a+1 L* > 0, the condition ([3T2]) 
holds. Note, however, that (13.211 fails if = 0. Since 



/( /,,■ ) _ Xbf(bx) I{x>Qs < 1 _ {1 _ a) n-l } + j {x < + s > 1 _ (1 _ a) n-l } 



g n {bx | bs) /(§ 

is bounded on s < 1, x > 1 — s condition (|3.3[) also holds. From the calculation 
above we see that (|3.5I) is satisfied with /i(a: | s) = aX[x a+1 f (x / X)]^ 1 . In particular, 
the asymptotic upper bound for the normalized second moment is 

^ r ^ 1,00 9 n i i— i 



,2 



r i r°° rt 2 i i 



with q n = 1. It is straightforward to check that SiLi ^~ llj=i ^~ i s minimized 
at 

Pi = 1 ^— — r, Si = 1 - Pi, 

n — z + 1 

with minimum equal to 1. The parameter A can be chosen to control the factor 
In some cases this can be minimized analytically. 

Example 4.1. Consider a Pareto density of the form f(x) = a(l + x)~ a ~ 1 , x > 0. 
Then 

a--/ „, a2 , .dx=\-*> r*( i -^Y +i dx. 



/a x 2 («+D/(x) A/a 

If a = 1 this is minimized at A = \/3 with minimum 2+ J^ ■ 

In the scaling mixture algorithm we assume > 0. This rules out distributions 
whose slowly varying function tends to 0. However, this is not a severe problem. 
One way to avoid it is to slightly modify the previous algorithm. The scaling 
mixture II algorithm has, with A > 0, u G (0, 1), S > 0, and a £ (0, 1), 

g t (x | s) = g(x) 

= {\b)- 1 f{x/\b)I{Q <x< Xbu} 



<(jb) lU V([tf/A&]^)/{z > Xbu 1+5 } + f(x)I{x < 0}, 



{l + 5)\b\\b; 

g n (x | s) = g{x)I{s < b - 6(1 - a)"- 1 } + f(x)I{s > b - b(l - a)^ 1 }. 

The density gi is based on the following sampling procedure. To generate a sample 
X from gi, first generate a candidate X' from /. If X' < put X — X' , if 
< X' < u, put X = A6X', and if X' > u put X = Xb(X') 1+s . 

Similar to the scaling mixture I algorithm it follows that, for 1 < i < n, 



< x < \u} + ^ +a m*/^) I{x > Xu ^ } 

aX (l + 5)aX^+z 
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uniformly for x > 1 — s and s. Since 



is bounded from below for x > 1 — s (|3 - 2[) holds. Just as for the scaling mixture I 
algorithm (|3.3p also holds. (|3.5|) hold with 

, , , . a 2 X (l + S)a 2 X T ^ 

hi{y | s) - 



y 2a+2 f(y/X) y 2a+ Th+if([y/x}Thy 

The asymptotic upper bound for the normalized second moment is hence 



/ n i—1 
R(y)dm = ~ 



3= 
Xu 



with a n = 1. As above \ ^7 i — TT! n — is minimized at 

Pi = 1 r, 9t = 1 - Pi, 

n — i + 1 

with minimum equal to 1. The remaining parameters A and u can be chosen to 
control the integrals in (|4.4|) . 



5. Achieving asymptotically optimal relative error 

In the previous section we observed that the conditional mixture algorithm and 
the GPD mixture algorithm can be designed to have almost asymptotically optimal 
relative error. A small asymptotic relative error is obtained by choosing the param- 
eter a close to 1. In this section we prove that these algorithms have asymptotically 
optimal relative error. This is accomplished by letting the parameter a depend on 
the threshold b in such a way that a — ► 1 slowly as b — > oo. For simplicity, we 
assume that X% > throughout this section. 

Theorem 5.1. Let v h n be the measure defined by the conditional mixture algorithm. 
Let pi — ra "yf 1 , (ft = 1 — Pi, and assume that 1 — a = 1 — 0(6 2<n-i) + 5 ) j or 
some < 5 < 2 (n-\) ' Then, the conditional mixture algorithm has asymptotically 
optimal relative error for computing pb- That is, 

lim ^ = 1. 

Remark 5.2. In Theorem 15. II the conditional mixture algorithm can be replaced by 
the GPD mixture algorithm. 

Proof. First rewrite the normalized second moment as in (|3.1[) : 

^ = h J W)^ {by)mbidy) ^ ^ j Mv)m b (dy). 

S n >l S„>1 
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Fix ao 6 (0, 1). Define the sets 

Ai = {y e R n : y l > 1 - «<-]., iy < 1 - Sj-i, j < «}> 
B l = {ye R" : y 4 < a (l - s;-i)}, 
Q = {t/e R" : a (l - Si_i) < < a(l - Sj-i)}, 
A = {y e R n : a(l - s,_i) < j/< < 1 — s<-i}. 

Then {s„ > 1} C U^A, and each Ai can be written as the disjoint union of the 
3 4_1 sets of the form 

hnhn-'-nii^nAi, (5.1) 

where each Ij is either Bj , Cj , or Dj . Each intersection (|5.1[) is of one of the types 
below. 

(i) Ij = Bj for each j = 1, . . . ,i — 1. 

(ii) among the sets Zi, . . . , Zj_i there is at least one j for which Ij = Cj and no 
j with Zj = Dj. 

(hi) among the sets Zi, . . . , Zj_i there is at least one j for which Ij = Dj. 
Next we treat the integrals 



Rb{y)m b (dy), 
hn-'-nii-inAi 

separately. The intersection belongs to one of the three types. 

Type (i): Consider y G B t n • • • H n A,. Then s<_i < 1 - (1 - ao) !_1 and 

Fix arbitrary s > 0. Then, for 6 sufficiently large, a b > 1 — e and the expression in 
the last display is bounded above by 

ft' ft F(6) 

It follows that Rl(y) -> JI^i ^J^C 1 ~ e ) _Q ( 1 - Si-i)~ Q uniformly on S x n • • • n 



Bi-i n and then it follows by the arguments in the proof of Theorem 13.21 that 

II 1 T-r 1 1 

limsup— / R b (y)m b (dy) < —~ TT (l-e) _Q . 

b^oo n 2 J n 2 ax ^ gi 

Bin-'-nSi-inAi 

Since e > was arbitrary we can let e — > to get 

i /■ ^ i i 
limsup^- / R b (y)m b (dy) < TT . 



,-i p J 
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Type (ii): For y € h n h n • • • n h-\ n A, it holds that s;_i < 1 - (1 - a)* -1 . 
Proceeding as in the Type (i) case, for e > and b sufficiently large, 

i-i 1 j 

< n - x 



7=1 ^ ftf(6) + ft 



P(fc) 

F(6a(l- Sj _i)) 



i-l 



1 1 F(6o(l - Si-i)) 



<TT 



1 1 F(6(l-e)(l-a) J - 1 ) 



It follows that 



\ P 3 q l F(b) 
^2 J Rb(y)m b (dy) 



n 

Jin— nii-inA 
i-l 



1 W 1 1 F(b(l-e)(l-aY~ 1 ) , , . 

<— II — — ife — — V/in-nVinij. 5.2 

n J=i ft 9i F(b) 

Let k be the first index between 1 and i — 1 such that = Cfc. Then = for 
each 1 < I < k — 1, and sj;_i < 1 — (1 — ao) fc_1 , whereas Si_i < 1 — (1 — a) 1 ^ 1 , 

P(y fc > 6ao(l - Sfc_i), K< > 6(1 - «<_!)) 



m 6 (/in---n/i_inA i ) < 



< 



F(6) 

P(F fc > 6o (l - a ) fc - 1 , Yj > 6(1 - a)'- 1 ) 
F(6) 

< Fibaoil-aoY^Fibil-ay- 1 ) 
F(b) 



Putting this into (|5.2[) yields the upper bound 

^2 / Rb{y)m b (dy) 

Jin—n/j-inAj 



i-i 



< 



^n- r-; x '" r > -"-^) 



5 

i-l 



F(b) 



< 



,2 



71 j=l ft « 



F(6) 



This converges to as 6 — * oo by the choice of a = a b - 

Type (iii): For ^ fl ^ n ■ ■ • (1 ij_i n of type (iii) we let j denote the first index 
for which Ij = Dj. Suppose first that Ik = for each k = 1, . . . ,j — 1. Then, 
Sj_i < 1 — (1 — ao)- 7-1 and, for arbitrary e > and 6 sufficiently large, 
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which is bounded in b. In addition, 



m&(-Bi n • • • n n d 3 ) < 



I'O 'j < D..S, , < 1 - (1 - ao)^- 1 ) 



< 



F(6) 



M«(dy) -* 0, 



(5.5) 



as 6 - 
that 



Bin---nBj_i 

oo, by the bounded convergence theorem. Combining (|5.4[) and (|5.5[) we see 



lim sup 

b — >oo 



Rhdrrih = 0. 



Birv-nB,- 



Finally, suppose = Ck for some fe = 1, . . . ,j — 1. Then, Sj-i < 1 — (1 — a) J 1 
and, for arbitrary s > and b sufficiently large, 

^)<gii»-«)a-n 



F(&) 



(5.6) 



In addition, just as in (|5.3|) . 

m b (/i n ■ ■ • n n zx,) < 



F{ba {\ - a o y- l )F{b{l - - a)''" 1 ) 



F(6) 



(5.7) 



Combining (|5.6p and (|5.7p we see that 
lim sup / Rbdmi, 

b — >oo J 



Sin---nBj_i 

3-1 



t-t 1 1 rF(&(l-e)(l-aV'- 1 )i 2 
<limsup_[J_ 



by the choice of a = a&. 



F(6) 



>(6ao(l-ao) j_1 ) = 0, 



□ 



6. Numerical illustrations 



In this section we examine the performance of the scaling mixture algorithm 
referred to as the SM algorithm. We perform a preliminary test using Pareto- 
distributed positive random variables and compa re the algorithm with the condi- 
tional mixture algorithm in iDupuis et alJ (|2007h . which w e refer to as the DLW 



algori thm, and the conditional Monte Carlo algorithm in lAsmussen an d Kroese 



( 20061) . For comparision, we first consider the same setting as in Dupuis et al 



(|2007l ). Table IV, pp. 18. The so-called true value in Table Q] was obtained from 
the same table. Each estimate was calculated using TV = 10 4 samples of S n . This 
estimation was repeated 100 times and the mean estimate, the mean standard er- 
ror and the mean calculation time were calculated. The parameter a in the DLW 
algorithm was chosen equal to 0.999 and the parameter A in the scaling mixture 
algorithm was chosen equal to 1 in Table [T] and equal to y/3, the optimal value, in 
Table H 

The standard Monte Carlo estimation is inferior to both importance sampling 
algorithms. The conditional Monte Carlo algorithm performs best for most proba- 
bilites in this study. 
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Table 1. Simulations of P(S n > b), where S n — X)"=i X i an(1 
P(X a > x) = (1 + x)~ 1 /2 j a = 0.999 and A = 1. N = 10 4 samples 
were used for each estimation, repeated 100 times.. 



n 


b 


True value 


MIS 


DLW 


CMC 


MC 




5 


5o+05 


0.007071 


0.0070744 
(7.26e-05) 
[0.816] 


0.0070714 
(6.10e-06) 
[0.799] 


0.00707034 
(4.89e-06) 
[0.731] 


0.0069960 
(4.88e-05) 
[0.685] 


Avg. est. 
(A. std. err.) 
[A. time (s)] 




5e+ll 


7.0711c-06 


7.0776c-06 
(7.53e-08) 
[1.005] 


7. 0710o-06 
(1.86e-09) 
[0.990] 


7. 07110-06 
(2.710-11) 
[0.908] 


1. 8000o-05 
(1.56o-05) 
[0.840] 




15 


5o+05 


0.02121 


0.021188 
(2.07e-04) 
[1.224] 


0.021215 
(4.15e-05) 
[1.219 ] 


0.021210 
(2.72o-05) 
[1.092] 


0.021724 
(2.05e-03) 
[1.006] 






5e+ll 


2.1213e-05 


2.1224e-05 
(2.25C-07) 
[1.450] 


2.1214o-05 
(5.82e-09) 
[1.456] 


2.1213c-05 
(3.09c-10) 
[1.283] 


1.800o-05 
(1.80e-05) 
[1.179] 




25 


5o+05 


0.035339 


0.035330 
(3.32o-04) 
[1.712] 


0.035348 
(9.06e-05) 
[1.729] 


0.035347 
(5.89e-05) 
[1.478] 


0.035462 
(2.61e-03) 
[1.366] 






5e+ll 


3.5355c-05 


3.5338o-05 
(3.77o-07) 
[1.993] 


3.5355o-05 
(1.04e-09) 
[2.016] 


3.5355c-05 
(1.32e-09) 
[1.689] 


3. 8000o-05 
(3.68e-05) 
[1.559] 





Table 2. Simulations of P(S n > b), where S n = Yh=i x u 
P(X 1 > x) = (1 + a = 0.999 and A = \/3. N = 10 4 

samples were used for each estimation, repeated 100 times. 



n 


b 


True value 


MIS 


DLW 


CMC 


MC 




5 


5e+05 


1.0001e-05 


1.0020e-05 
(1.07c-07) 
[0.429] 


1.0001e-05 
(2.78e-09) 
[0.415] 


1.0001e-05 
(2.58e-10) 
[0.433] 


6.OOO0-O6 
(6.OO0-6) 
[0.346] 


Avg. est. 
(std. err.) 
[time (s)] 




5c+ll 


1.0000e-13 


9.9996c-12 
(1.07c-13) 
[0.433] 


9.9999o-12 
(2.79c-15) 
[0.418] 


1.0000e-13 
(8.59c-22) 
[0.430] 




(0) 
[0.352] 




15 


5c+05 


3.0010c-05 


3.0004e-05 
(3.21e-07) 
[0.491] 


3.0011O-05 
(1.12e-08) 
[0.445] 


3.0010o-05 
(1.74e-09) 
[0.437] 


3.0000c-05 
(2.71e-05) 
[0.375] 






5c+ll 


3.0000e-ll 


2.9990O-11 
(3.22e-13) 
[0.490] 


3.0000O-11 
(9.06O-15) 
[0.445] 


3.0000O-11 
(1.75e-20) 
[0.431] 




(0) 
[0.365] 




25 


5c+05 


5.0029c-05 


5.0098c-05 
(5.37o-07) 
[0.561] 


5.00274c-05 
(1.90e-08) 
[0.485] 


5.00290o-05 
(4.10e-09) 
[0.432] 


3.7000o-05 
(3.34e-05) 
[0.386] 






5e+ll 


5.0000e-ll 


4. 99700-11 
(5.38e-13) 
[0.556] 


4.99980-11 
(1.65o-14) 
[0.479] 


5.0000O-11 
(1.54e-20) 
[0.439] 




(0) 
[0.382] 
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